Santa Barbara's nature-based 'Sense of Place', as told by Twitter

Summary

The aim of this project was to evaluate whether or not geotagged social media data can be useful in providing insight into a region’s “Sense of Place” using Santa Barbara as a case study.

Sense of Place can be defined as the connection people feel to their geographic surroundings, including both the natural and built environment. Locations with a strong sense of place often have a strong identity felt by both locals and visitors.

Findings

Not surprisingly, tourists and locals both tweet about nature. Tourists tweet about nature more (X%), but stick to the popular tourist sites in town, including the wharf, waterfront, zoo, Santa Barbara Bowl and more. Santa Barbara locals are also found at these sites, just not in as high a proportion. Natural areas that are further from the downtown

There is significant overlap in tourist and local patterns within the downtown area, indicating that tourists and locals alike share a fondness for the same areas and things.

Geotagged social media data in conservation

Geotagged social media data has been used in recent years to study people’s interaction with the natural environment in various ways, many of which are focused on tourism:

  • Quantifying nature-based tourism (Wood et al. 2013, Kim et al. 2019)
  • Mapping tourist footprints (Runge & Daigle 2020), flows (Chua et al. 2016), and hot spots (Garcia-Palomares et al. 2015)
  • Understanding tourist preferences in nature-based places such as Kruger National Park (Hausmann et al. 2017, Levin et al. 2017, Tenkanen et al. 2017)
  • Monitoring and measuring environmental conditions of places (e.g. Great Barrier Reef, Becken et al. 2017)

This project differs in that I wanted to map the spatial patterns of tourists and locals, and understand how these two user groups engage with and perceive the natural environment of Santa Barbara.

It also gave me a chance to learn new text mining tools.

Why Santa Barbara?

The easy answer - I live here! Since I know the city and surrounding areas rather well, I could quickly look at spatial patterns and understand what exists at that location. The total number of tweets coming from Santa Barbara is also manageable compared to a much larger urban city.

Also, Santa Barbara is known for being a tourist town with beautiful natural and built landscapes (ok - I might be a bit biased here). Santa Barbara sits between the mountains and the ocean just 1.5 hours north of LA and has excellent recreation, dining, and entertainment options. It’s no surprise that a lot of UCSB students end up sticking around after graduation, myself included 🙋‍♀️.

Finding the data

Going into this project, I thought that Twitter data would be easily accessible, based on the number of projects I had seen that used Twitter data and related R packages. I quickly learned that this was not the case: Twitter only allows free public access to the past 9 days of tweets. This was a problem since I wanted all tweets from January 1, 2015 through December 31, 2019.

Twitter data was obtained freely through an established partnership between UCSB Library and Crimson Hexagon. Before downloading, the data was queried to meet the following conditions:

  1. Tweet came from the Santa Barbara area
  2. Only original tweets (no retweets)
  3. Date was marked between January 1, 2015 and December 31, 2019

Crimson Hexagon only allows 10,000 randomly selected tweets to be exported at a time, manually, in .xls format. Due to this restriction, data was manually downloaded in 2-day increments in order to capture all tweets (😓). This took a significant amount of point and click time as you can imagine!
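To give a sense of the scale of that manual export, here is a minimal sketch that splits the study period into 2-day windows. It assumes the windows are consecutive and non-overlapping, which is my reading of "every 2 days"; the function name `two_day_windows` is hypothetical and this is not the process Crimson Hexagon itself uses.

```python
from datetime import date, timedelta


def two_day_windows(start, end):
    """Yield (first_day, last_day) pairs that cover start..end inclusive.

    Assumes consecutive, non-overlapping 2-day windows.
    """
    current = start
    while current <= end:
        # The last window is clipped so it never runs past `end`.
        last = min(current + timedelta(days=1), end)
        yield current, last
        current += timedelta(days=2)


windows = list(two_day_windows(date(2015, 1, 1), date(2019, 12, 31)))
print(len(windows))
print(windows[0], windows[-1])
```

Under that assumption the five years break down into 913 separate exports.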

Once downloaded, the Twitter data did not contain all desired information, including whether or not the tweet was geotagged, which was vital to this project. To get this information I stepped outside of my R comfort zone and used the Python twarc library. This library can be used to “rehydrate” Twitter data using individual tweet IDs, and then store all associated tweet information as .json files. From here I was able to remove all tweets that did not have a geotag, giving a total of 79,981 tweets.
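The filtering step can be sketched roughly as follows. This is an illustration, not the code used in the project: it assumes the rehydrated tweets are stored one JSON object per line, and that "geotagged" means the tweet has either a non-null `coordinates` field or a non-null `place` field (both are standard fields on v1.1 tweet objects). The function names and the sample tweet IDs are hypothetical.

```python
import json


def has_geotag(tweet):
    """Return True if the tweet has an exact point or a tagged place.

    Assumption: either field counts as a geotag for this project.
    """
    return bool(tweet.get("coordinates")) or bool(tweet.get("place"))


def filter_geotagged(json_lines):
    """Parse one JSON tweet per line and keep only the geotagged ones."""
    kept = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        if has_geotag(tweet):
            kept.append(tweet)
    return kept


# Hypothetical sample standing in for a file of rehydrated tweets.
sample = [
    json.dumps({"id_str": "1", "place": None,
                "coordinates": {"type": "Point",
                                "coordinates": [-119.714, 34.4258]}}),
    json.dumps({"id_str": "2", "coordinates": None, "place": None}),
    json.dumps({"id_str": "3", "coordinates": None,
                "place": {"full_name": "Santa Barbara, CA"}}),
]

geotagged = filter_geotagged(sample)
print([t["id_str"] for t in geotagged])
```

With a real file, the same function could be called on an open file handle, since iterating over a file yields one line at a time.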

Twitter data

Here is a sample of the tweet data:

Month | Day | Time | Year | full_text | user_location | retweet_count | favorite_count | month_num | date
Jun | 21 | 03:53:25 | 2018 | Another local Cali craft brew is a personal prerequisite. - Drinking a Cali Common by Topa Topa Brewing Company @ Topa Topa Brewing Company — https://t.co/wwe8uP8Yxc #photo | Greater Cincinnati | 0 | 0 | 6 | 2018-06-21
Oct | 17 | 02:48:14 | 2015 | Many people wonder how our little girl is only 1 years old going on 2. #toddler #babysigngraduate. @… https://t.co/LiuTon0r17 | Santa Barbara, CA | 0 | 1 | 10 | 2015-10-17
Oct | 27 | 14:28:07 | 2016 | current weather in Santa Barbara: overcast clouds, 63°F 90% humidity, wind 6mph, pressure 1023mb | Santa Barbara, CA | 0 | 1 | 10 | 2016-10-27
Oct | 3 | 18:35:16 | 2015 | it’s hard to stay mad at the world when there are such beautiful people… https://t.co/Vit4vupzO5 | California, USA | 0 | 0 | 10 | 2015-10-03
Oct | 12 | 17:38:00 | 2017 | when you find “the one”💙| brielleviator @ Backyard Bowls https://t.co/pxmKycXxw3 | Santa Barbara and Los Angeles | 0 | 0 | 10 | 2017-10-12
Jun | 8 | 03:11:41 | 2016 | Stopped for a flight of their truly delicious Sauv Blanc! #santabarbarawinecountry… https://t.co/yqUqCYU1x6 | Las Vegas | 0 | 0 | 6 | 2016-06-08
Jul | 29 | 04:33:24 | 2018 | 🗓My Days JULY/28/2018🗓 🗝" #SAYITAINTSO #SB "🗝 #IJS #ImALLLove #Magician #ALLLove ____________________________________ #RecordingArtist #LifestyleBlogger #BrandAmbassador #Enlightened.… https://t.co/aFTU51bHDu | Cerritos, CA | 0 | 0 | 7 | 2018-07-29

Tweets over time

Almost immediately after plotting tweets over time, you can see that the total number of geotagged tweets goes down over time. Most noticeably, there is a sharp drop in tweets at the end of April 2015. It seems this is due to “a change in Twitter’s ‘post Tweet’ user-interface design results in fewer Tweets being geo-tagged” (source). The first 4 months of 2015 have 15,720 tweets, or roughly 19% of all tweets. To reduce skew in the data, and to remove tweets from those months that may have been geotagged without the user's knowledge, I moved forward with all tweets from May 1, 2015 through the end of 2019.
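A minimal sketch of the two steps described above, counting tweets per month and then dropping everything before the May 1, 2015 cutoff. It assumes each tweet has a `date` value in YYYY-MM-DD form, as in the sample table; the function names are hypothetical, and the two dates before May 2015 are made up for illustration.

```python
from collections import Counter
from datetime import date

CUTOFF = date(2015, 5, 1)  # first day kept in the analysis


def monthly_counts(dates):
    """Count tweets per (year, month), e.g. for a tweets-over-time plot."""
    return Counter((d.year, d.month) for d in dates)


def on_or_after_cutoff(dates, cutoff=CUTOFF):
    """Keep only dates on or after the cutoff."""
    return [d for d in dates if d >= cutoff]


# The first two dates are invented; the rest come from the sample table.
raw = ["2015-02-11", "2015-04-30", "2015-10-03", "2015-10-17",
       "2016-06-08", "2016-10-27", "2017-10-12"]
dates = [date.fromisoformat(s) for s in raw]

kept = on_or_after_cutoff(dates)
print(monthly_counts(dates)[(2015, 10)])
print(len(dates), len(kept))
```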

Tweet map

The spatial distribution of tweets highlights areas of higher population density and tourist areas in downtown Santa Barbara.

There is a single coordinate that has over 11,000 tweets reported across all years. It is near De La Vina between Islay and Valerio. There is nothing remarkable about this site so I assume it is the default coordinate when people tag “Santa Barbara” generally. The coordinate is 34.4258, -119.714.
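One way to spot a default coordinate like this is to count how often each exact latitude/longitude pair occurs. The sketch below assumes coordinates are available as (latitude, longitude) tuples and rounds to 4 decimal places to match the precision quoted above. The function name and all sample points other than 34.4258, -119.714 are hypothetical.

```python
from collections import Counter


def most_common_coordinate(points, ndigits=4):
    """Return ((lat, lon), count) for the most frequent rounded coordinate."""
    rounded = [(round(lat, ndigits), round(lon, ndigits))
               for lat, lon in points]
    return Counter(rounded).most_common(1)[0]


# Hypothetical sample: three tweets on the default point, two elsewhere.
points = [
    (34.4258, -119.714),
    (34.4133, -119.6856),
    (34.4258, -119.714),
    (34.4200, -119.7020),
    (34.4258, -119.7140),
]

coord, count = most_common_coordinate(points)
print(coord, count)
```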

As you zoom in on the map, clusters will disaggregate. You can click on blue points to see the tweet.

Trying basic leaflet map

Defining tourists & locals

This project aimed to understand if and how preferences differ between tourists and locals for nature-based places within the Santa Barbara area. To test this, I needed a way to identify each user as a tourist or a local. I ended up using a two-step process:

  1. If the user has self-identified their location as somewhere in the Santa Barbara area, they are designated a local. This includes Carpinteria, Santa Barbara, Montecito, Goleta, Gaviota, and UCSB.
  2. For the remainder, I used the number of times they have tweeted from Santa Barbara within a year to designate user type. If someone has tweeted across more than 2 months in the same year from Santa Barbara, they are identified as a local. This is consistent with how Eric Fischer determined tourists in his work.
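The two-step rule above can be sketched like this. It is an illustration under stated assumptions, not the project's actual code: it assumes each tweet is a dict with `user_id`, `user_location`, `year`, and `month` keys, and it matches the self-identified location with a simple case-insensitive substring check. The names `LOCAL_PLACES` and `classify_users` are hypothetical.

```python
from collections import defaultdict

# Place names from step 1, lowercased for substring matching (an assumption).
LOCAL_PLACES = ("carpinteria", "santa barbara", "montecito",
                "goleta", "gaviota", "ucsb")


def classify_users(tweets):
    """Map each user_id to 'local' or 'tourist' using the two-step rule."""
    users = set()
    self_identified = set()
    months_by_user_year = defaultdict(set)

    for t in tweets:
        uid = t["user_id"]
        users.add(uid)
        # Step 1: profile location mentions a Santa Barbara area place.
        location = (t.get("user_location") or "").lower()
        if any(place in location for place in LOCAL_PLACES):
            self_identified.add(uid)
        # Track distinct months with tweets, per user and per year.
        months_by_user_year[(uid, t["year"])].add(t["month"])

    # Step 2: tweeted in more than 2 distinct months of the same year.
    frequent = {uid for (uid, _year), months in months_by_user_year.items()
                if len(months) > 2}

    return {uid: "local" if uid in self_identified or uid in frequent
            else "tourist"
            for uid in users}


# Hypothetical users; locations are borrowed from the sample table.
tweets = [
    {"user_id": "a", "user_location": "Santa Barbara, CA", "year": 2015, "month": 10},
    {"user_id": "b", "user_location": "Las Vegas", "year": 2016, "month": 6},
    {"user_id": "c", "user_location": "California, USA", "year": 2017, "month": 1},
    {"user_id": "c", "user_location": "California, USA", "year": 2017, "month": 3},
    {"user_id": "c", "user_location": "California, USA", "year": 2017, "month": 7},
    {"user_id": "d", "user_location": "Greater Cincinnati", "year": 2018, "month": 11},
    {"user_id": "d", "user_location": "Greater Cincinnati", "year": 2018, "month": 12},
    {"user_id": "d", "user_location": "Greater Cincinnati", "year": 2019, "month": 1},
]

labels = classify_users(tweets)
print(sorted(labels.items()))
```

Note that a substring match would also label a profile such as "Santa Barbara and Los Angeles" as local, so a stricter match may be preferable depending on how step 1 is meant to be applied.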

This is not foolproof. There are definitely instances where people visit and tweet from Santa Barbara more than two months a year, especially if they are visiting family or live within a couple hours' driving distance. But without more data (and time) to determine where “tourists” truly live, this will have to do.

There are 21,811 tweets from tourists and 45,420 tweets from locals (32% and 68%). There are 12,460 unique tourists and just 1,893 unique local users.